Overview
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 434 | 442 |
| Missing cells (%) | 8.1% | 8.3% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
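The percentages and sizes in this table can be reproduced from the raw counts. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that 1 KiB = 1024 B; the function names are illustrative and are not part of ydata-profiling.

```python
def missing_cells_pct(missing_cells: int, n_vars: int, n_obs: int) -> float:
    """Share of missing cells among all cells, as a percentage."""
    return 100.0 * missing_cells / (n_vars * n_obs)


def total_size_kib(avg_record_bytes: float, n_obs: int) -> float:
    """Total in-memory size in KiB, derived from the average record size."""
    return avg_record_bytes * n_obs / 1024


# Counts taken from the table above (12 variables, 446 observations).
pct_a = missing_cells_pct(434, 12, 446)
pct_b = missing_cells_pct(442, 12, 446)
size = total_size_kib(104.0, 446)
print(f"A: {pct_a:.1f}%  B: {pct_b:.1f}%  size: {size:.1f} KiB")
```

The missing-cell totals are also consistent with the per-variable counts reported later: 87 + 345 + 2 = 434 for Dataset A and 87 + 354 + 1 = 442 for Dataset B (Age, Cabin, Embarked).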
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Sex is highly overall correlated with Survived | Alert not present in this dataset | High correlation |
| Survived is highly overall correlated with Sex | Alert not present in this dataset | High correlation |
| Age has 87 (19.5%) missing values | Age has 87 (19.5%) missing values | Missing |
| Cabin has 345 (77.4%) missing values | Cabin has 354 (79.4%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 304 (68.2%) zeros | SibSp has 296 (66.4%) zeros | Zeros |
| Parch has 337 (75.6%) zeros | Parch has 341 (76.5%) zeros | Zeros |
| Fare has 6 (1.3%) zeros | Fare has 9 (2.0%) zeros | Zeros |
| Alert not present in this dataset | Fare is highly overall correlated with SibSp | High correlation |
| Alert not present in this dataset | SibSp is highly overall correlated with Fare | High correlation |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2024-10-16 08:47:10.498563 | 2024-10-16 08:47:13.606847 |
| Analysis finished | 2024-10-16 08:47:13.603553 | 2024-10-16 08:47:16.740179 |
| Duration | 3.1 seconds | 3.13 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
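The reported durations follow from the start and finish timestamps. A minimal sketch using only the standard library, assuming the timestamps use the `%Y-%m-%d %H:%M:%S.%f` layout shown above; `duration_seconds` is an illustrative helper, not a ydata-profiling function.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2024-10-16 08:47:10.498563", "2024-10-16 08:47:13.603553")
dur_b = duration_seconds("2024-10-16 08:47:13.606847", "2024-10-16 08:47:16.740179")
print(f"A: {dur_a:.2f} s  B: {dur_b:.2f} s")
```

Dataset B's analysis starts about 3 ms after Dataset A's finishes, which suggests the two profiles were computed one after the other.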
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 466.20852 | 445.96637 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 2 |
| Maximum | 891 | 885 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 2 |
| 5-th percentile | 44.25 | 51.25 |
| Q1 | 247.75 | 229.5 |
| median | 478 | 447 |
| Q3 | 694.5 | 656.75 |
| 95-th percentile | 849.25 | 827.5 |
| Maximum | 891 | 885 |
| Range | 890 | 883 |
| Interquartile range (IQR) | 446.75 | 427.25 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 257.54667 | 251.05841 |
| Coefficient of variation (CV) | 0.55242806 | 0.56295368 |
| Kurtosis | -1.18671 | -1.1765655 |
| Mean | 466.20852 | 445.96637 |
| Median Absolute Deviation (MAD) | 222 | 216 |
| Skewness | -0.11634189 | -0.028202166 |
| Sum | 207929 | 198901 |
| Variance | 66330.287 | 63030.325 |
| Monotonicity | Not monotonic | Not monotonic |
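Several of the figures above are derived from others in the same tables. The sketch below recomputes the range, IQR, coefficient of variation, and variance for PassengerId in Dataset A from the reported minimum, maximum, quartiles, mean, and standard deviation. It assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared); `derived_stats` is an illustrative name.

```python
def derived_stats(minimum, maximum, q1, q3, mean, std):
    """Recompute derived statistics from the basic ones."""
    return {
        "range": maximum - minimum,   # spread between the extremes
        "iqr": q3 - q1,               # interquartile range
        "cv": std / mean,             # coefficient of variation
        "variance": std ** 2,         # square of the standard deviation
    }


# PassengerId, Dataset A, values copied from the tables above.
stats_a = derived_stats(1, 891, 247.75, 694.5, 466.20852, 257.54667)
for name, value in stats_a.items():
    print(f"{name}: {value:.5f}")
```

Small differences in the last printed digits are expected, because the inputs are already rounded in the report.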
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 767 | 1 | 0.2% |
| 517 | 1 | 0.2% |
| 692 | 1 | 0.2% |
| 536 | 1 | 0.2% |
| 495 | 1 | 0.2% |
| 213 | 1 | 0.2% |
| 399 | 1 | 0.2% |
| 818 | 1 | 0.2% |
| 886 | 1 | 0.2% |
| 108 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
| Value | Count | Frequency (%) |
| 738 | 1 | 0.2% |
| 157 | 1 | 0.2% |
| 222 | 1 | 0.2% |
| 773 | 1 | 0.2% |
| 875 | 1 | 0.2% |
| 386 | 1 | 0.2% |
| 696 | 1 | 0.2% |
| 571 | 1 | 0.2% |
| 788 | 1 | 0.2% |
| 594 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
| Value | Count | Frequency (%) |
| 1 | 1 | 0.2% |
| 2 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 14 | 1 | 0.2% |
| 15 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 2 | 1 | 0.2% |
| 3 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 13 | 1 | 0.2% |
| 16 | 1 | 0.2% |
| 18 | 1 | 0.2% |
| 22 | 1 | 0.2% |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Category frequency plot for Dataset A and Dataset B (categories: 0, 1)
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 1 | 1 |
| 2nd row | 1 | 0 |
| 3rd row | 1 | 0 |
| 4th row | 0 | 1 |
| 5th row | 0 | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 279 | 62.6% |
| 1 | 167 | 37.4% |
| Value | Count | Frequency (%) |
| 0 | 282 | 63.2% |
| 1 | 164 | 36.8% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 0 | 279 | |
| 1 | 167 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 279 | |
| 1 | 167 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 0 | 279 | |
| 1 | 167 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 0 | 279 | |
| 1 | 167 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 0 | 279 | |
| 1 | 167 |
| Value | Count | Frequency (%) |
| 0 | 282 | |
| 1 | 164 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Category frequency plot for Dataset A and Dataset B (categories: 3, 1, 2)
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 2 | 3 |
| 2nd row | 3 | 2 |
| 3rd row | 2 | 2 |
| 4th row | 3 | 2 |
| 5th row | 3 | 2 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 251 | 56.3% |
| 1 | 109 | 24.4% |
| 2 | 86 | 19.3% |
| Value | Count | Frequency (%) |
| 3 | 256 | 57.4% |
| 1 | 104 | 23.3% |
| 2 | 86 | 19.3% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 109 | |
| 2 | 86 | 19.3% |
| Value | Count | Frequency (%) |
| 3 | 256 | |
| 1 | 104 | |
| 2 | 86 | 19.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 109 | |
| 2 | 86 | 19.3% |
| Value | Count | Frequency (%) |
| 3 | 256 | |
| 1 | 104 | |
| 2 | 86 | 19.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 109 | |
| 2 | 86 | 19.3% |
| Value | Count | Frequency (%) |
| 3 | 256 | |
| 1 | 104 | |
| 2 | 86 | 19.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 109 | |
| 2 | 86 | 19.3% |
| Value | Count | Frequency (%) |
| 3 | 256 | |
| 1 | 104 | |
| 2 | 86 | 19.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 446 |
| Value | Count | Frequency (%) |
| (unknown) | 446 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 251 | |
| 1 | 109 | |
| 2 | 86 | 19.3% |
| Value | Count | Frequency (%) |
| 3 | 256 | |
| 1 | 104 | |
| 2 | 86 | 19.3% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 57 |
| Median length | 49 | 48 |
| Mean length | 26.975336 | 27.152466 |
| Min length | 12 | 12 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Lemore, Mrs. (Amelia Milley) | Gilnagh, Miss. Katherine "Katie" |
| 2nd row | Karun, Miss. Manca | Bracken, Mr. James H |
| 3rd row | Hart, Miss. Eva Miriam | Mack, Mrs. (Mary) |
| 4th row | Stanley, Mr. Edward Roland | Abelson, Mrs. Samuel (Hannah Wizosky) |
| 5th row | Perkin, Mr. John Henry | Davies, Mr. Charles Henry |
| Value | Count | Frequency (%) |
| mr | 261 | 14.3% |
| miss | 94 | 5.2% |
| mrs | 60 | 3.3% |
| william | 36 | 2.0% |
| john | 23 | 1.3% |
| master | 20 | 1.1% |
| henry | 18 | 1.0% |
| james | 14 | 0.8% |
| george | 13 | 0.7% |
| mary | 13 | 0.7% |
| Other values (897) | 1267 |
| Value | Count | Frequency (%) |
| mr | 268 | 14.7% |
| miss | 83 | 4.6% |
| mrs | 68 | 3.7% |
| william | 30 | 1.6% |
| master | 23 | 1.3% |
| henry | 22 | 1.2% |
| john | 20 | 1.1% |
| charles | 13 | 0.7% |
| james | 13 | 0.7% |
| mary | 12 | 0.7% |
| Other values (892) | 1270 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1375 | 11.4% |
| r | 963 | 8.0% |
| e | 839 | 7.0% |
| a | 796 | 6.6% |
| n | 671 | 5.6% |
| i | 663 | 5.5% |
| s | 651 | 5.4% |
| M | 553 | 4.6% |
| l | 536 | 4.5% |
| o | 518 | 4.3% |
| Other values (49) | 4466 |
| Value | Count | Frequency (%) |
| (space) | 1378 | 11.4% |
| r | 982 | 8.1% |
| a | 856 | 7.1% |
| e | 849 | 7.0% |
| n | 670 | 5.5% |
| i | 645 | 5.3% |
| s | 642 | 5.3% |
| M | 568 | 4.7% |
| l | 535 | 4.4% |
| o | 488 | 4.0% |
| Other values (48) | 4497 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 12031 |
| Value | Count | Frequency (%) |
| (unknown) | 12110 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1375 | 11.4% |
| r | 963 | 8.0% |
| e | 839 | 7.0% |
| a | 796 | 6.6% |
| n | 671 | 5.6% |
| i | 663 | 5.5% |
| s | 651 | 5.4% |
| M | 553 | 4.6% |
| l | 536 | 4.5% |
| o | 518 | 4.3% |
| Other values (49) | 4466 |
| Value | Count | Frequency (%) |
| (space) | 1378 | 11.4% |
| r | 982 | 8.1% |
| a | 856 | 7.1% |
| e | 849 | 7.0% |
| n | 670 | 5.5% |
| i | 645 | 5.3% |
| s | 642 | 5.3% |
| M | 568 | 4.7% |
| l | 535 | 4.4% |
| o | 488 | 4.0% |
| Other values (48) | 4497 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 12031 |
| Value | Count | Frequency (%) |
| (unknown) | 12110 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1375 | 11.4% |
| r | 963 | 8.0% |
| e | 839 | 7.0% |
| a | 796 | 6.6% |
| n | 671 | 5.6% |
| i | 663 | 5.5% |
| s | 651 | 5.4% |
| M | 553 | 4.6% |
| l | 536 | 4.5% |
| o | 518 | 4.3% |
| Other values (49) | 4466 |
| Value | Count | Frequency (%) |
| (space) | 1378 | 11.4% |
| r | 982 | 8.1% |
| a | 856 | 7.1% |
| e | 849 | 7.0% |
| n | 670 | 5.5% |
| i | 645 | 5.3% |
| s | 642 | 5.3% |
| M | 568 | 4.7% |
| l | 535 | 4.4% |
| o | 488 | 4.0% |
| Other values (48) | 4497 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 12031 |
| Value | Count | Frequency (%) |
| (unknown) | 12110 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 1375 | 11.4% |
| r | 963 | 8.0% |
| e | 839 | 7.0% |
| a | 796 | 6.6% |
| n | 671 | 5.6% |
| i | 663 | 5.5% |
| s | 651 | 5.4% |
| M | 553 | 4.6% |
| l | 536 | 4.5% |
| o | 518 | 4.3% |
| Other values (49) | 4466 |
| Value | Count | Frequency (%) |
| (space) | 1378 | 11.4% |
| r | 982 | 8.1% |
| a | 856 | 7.1% |
| e | 849 | 7.0% |
| n | 670 | 5.5% |
| i | 645 | 5.3% |
| s | 642 | 5.3% |
| M | 568 | 4.7% |
| l | 535 | 4.4% |
| o | 488 | 4.0% |
| Other values (48) | 4497 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Category frequency plot for Dataset A and Dataset B (categories: male, female)
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.690583 | 4.6816143 |
| Min length | 4 | 4 |
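The mean length can be checked against the category counts reported under Common Values for this variable (292 male and 154 female in Dataset A, 294 and 152 in Dataset B). The sketch assumes the mean length is the count-weighted average of the string lengths; `mean_length` is an illustrative helper.

```python
def mean_length(counts: dict) -> float:
    """Count-weighted mean string length of the category labels."""
    total_chars = sum(len(value) * n for value, n in counts.items())
    return total_chars / sum(counts.values())


mean_a = mean_length({"male": 292, "female": 154})
mean_b = mean_length({"male": 294, "female": 152})
print(f"A: {mean_a:.6f}  B: {mean_b:.6f}")
```

The numerators (2092 and 2088 characters) match the totals listed under "Most occurring categories" for this variable.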
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | female | female |
| 2nd row | female | male |
| 3rd row | female | female |
| 4th row | male | female |
| 5th row | male | male |
Common Values
| Value | Count | Frequency (%) |
| male | 292 | 65.5% |
| female | 154 | 34.5% |
| Value | Count | Frequency (%) |
| male | 294 | 65.9% |
| female | 152 | 34.1% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| male | 292 | |
| female | 154 |
| Value | Count | Frequency (%) |
| male | 294 | |
| female | 152 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
| Value | Count | Frequency (%) |
| e | 598 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 152 | 7.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2092 |
| Value | Count | Frequency (%) |
| (unknown) | 2088 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
| Value | Count | Frequency (%) |
| e | 598 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 152 | 7.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2092 |
| Value | Count | Frequency (%) |
| (unknown) | 2088 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
| Value | Count | Frequency (%) |
| e | 598 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 152 | 7.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2092 |
| Value | Count | Frequency (%) |
| (unknown) | 2088 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 600 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 154 | 7.4% |
| Value | Count | Frequency (%) |
| e | 598 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 152 | 7.3% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 72 | 81 |
| Distinct (%) | 20.1% | 22.6% |
| Missing | 87 | 87 |
| Missing (%) | 19.5% | 19.5% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.646936 | 29.517187 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.67 |
| Maximum | 74 | 80 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.67 |
| 5-th percentile | 3 | 4 |
| Q1 | 20 | 20 |
| median | 28 | 28 |
| Q3 | 39 | 38 |
| 95-th percentile | 56.1 | 54.1 |
| Maximum | 74 | 80 |
| Range | 73.58 | 79.33 |
| Interquartile range (IQR) | 19 | 18 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 15.164182 | 14.679815 |
| Coefficient of variation (CV) | 0.51149238 | 0.49733109 |
| Kurtosis | -0.090554845 | 0.10730441 |
| Mean | 29.646936 | 29.517187 |
| Median Absolute Deviation (MAD) | 9 | 9 |
| Skewness | 0.31833731 | 0.33728083 |
| Sum | 10643.25 | 10596.67 |
| Variance | 229.95241 | 215.49696 |
| Monotonicity | Not monotonic | Not monotonic |
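For Age, the reported mean and Distinct (%) only agree with the other figures if the 87 missing values are excluded from the denominator. The sketch below checks this, assuming both statistics are computed over the 446 - 87 = 359 non-missing observations; the variable names are illustrative.

```python
n_obs, n_missing = 446, 87
n_valid = n_obs - n_missing  # non-missing Age values in each dataset

# Mean = Sum / number of non-missing values.
mean_a = 10643.25 / n_valid
mean_b = 10596.67 / n_valid

# Distinct (%) = distinct values / number of non-missing values.
distinct_pct_a = 100.0 * 72 / n_valid
distinct_pct_b = 100.0 * 81 / n_valid

print(f"mean A: {mean_a:.6f}  mean B: {mean_b:.6f}")
print(f"distinct A: {distinct_pct_a:.1f}%  distinct B: {distinct_pct_b:.1f}%")
```

Dividing by all 446 rows instead would give a mean of about 23.9 for Dataset A, which does not match the report.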
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 25 | 15 | 3.4% |
| 21 | 13 | 2.9% |
| 19 | 13 | 2.9% |
| 26 | 13 | 2.9% |
| 24 | 13 | 2.9% |
| 29 | 11 | 2.5% |
| 18 | 11 | 2.5% |
| 22 | 10 | 2.2% |
| 27 | 10 | 2.2% |
| 30 | 10 | 2.2% |
| Other values (62) | 240 | 53.8% |
| (Missing) | 87 | 19.5% |
| Value | Count | Frequency (%) |
| 28 | 17 | 3.8% |
| 22 | 14 | 3.1% |
| 30 | 13 | 2.9% |
| 25 | 13 | 2.9% |
| 18 | 13 | 2.9% |
| 24 | 13 | 2.9% |
| 36 | 12 | 2.7% |
| 19 | 11 | 2.5% |
| 21 | 11 | 2.5% |
| 26 | 11 | 2.5% |
| Other values (71) | 231 | 51.8% |
| (Missing) | 87 | 19.5% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 2 | 0.4% |
| 0.92 | 1 | 0.2% |
| 1 | 5 | 1.1% |
| 2 | 6 | 1.3% |
| 3 | 5 | 1.1% |
| 4 | 5 | 1.1% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0.67 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 0.92 | 1 | 0.2% |
| 1 | 4 | 0.9% |
| 2 | 5 | 1.1% |
| 3 | 3 | 0.7% |
| 4 | 6 | 1.3% |
| 5 | 3 | 0.7% |
| 6 | 1 | 0.2% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.52466368 | 0.55829596 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 304 | 296 |
| Zeros (%) | 68.2% | 66.4% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 3 | 3 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.0864112 | 1.1117193 |
| Coefficient of variation (CV) | 2.0706811 | 1.9912723 |
| Kurtosis | 16.948289 | 15.214249 |
| Mean | 0.52466368 | 0.55829596 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.5645831 | 3.3814231 |
| Sum | 234 | 249 |
| Variance | 1.1802892 | 1.2359198 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
| Value | Count | Frequency (%) |
| 0 | 304 | 68.2% |
| 1 | 103 | 23.1% |
| 2 | 15 | 3.4% |
| 3 | 10 | 2.2% |
| 4 | 8 | 1.8% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 296 | 66.4% |
| 1 | 108 | 24.2% |
| 2 | 16 | 3.6% |
| 4 | 10 | 2.2% |
| 3 | 10 | 2.2% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 304 | 68.2% |
| 1 | 103 | 23.1% |
| 2 | 15 | 3.4% |
| 3 | 10 | 2.2% |
| 4 | 8 | 1.8% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 296 | 66.4% |
| 1 | 108 | 24.2% |
| 2 | 16 | 3.6% |
| 3 | 10 | 2.2% |
| 4 | 10 | 2.2% |
| 5 | 3 | 0.7% |
| 8 | 3 | 0.7% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 6 | 7 |
| Distinct (%) | 1.3% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.39461883 | 0.38789238 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 5 | 6 |
| Zeros | 337 | 341 |
| Zeros (%) | 75.6% | 76.5% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 5 | 6 |
| Range | 5 | 6 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.79682951 | 0.80990306 |
| Coefficient of variation (CV) | 2.0192384 | 2.0879582 |
| Kurtosis | 6.6046872 | 8.9272329 |
| Mean | 0.39461883 | 0.38789238 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.3561864 | 2.5974465 |
| Sum | 176 | 173 |
| Variance | 0.63493727 | 0.65594296 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 0 | 337 | 75.6% |
| 1 | 55 | 12.3% |
| 2 | 47 | 10.5% |
| 3 | 3 | 0.7% |
| 5 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 341 | 76.5% |
| 1 | 51 | 11.4% |
| 2 | 47 | 10.5% |
| 3 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 337 | 75.6% |
| 1 | 55 | 12.3% |
| 2 | 47 | 10.5% |
| 3 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 5 | 2 | 0.4% |
| Value | Count | Frequency (%) |
| 0 | 341 | 76.5% |
| 1 | 51 | 11.4% |
| 2 | 47 | 10.5% |
| 3 | 3 | 0.7% |
| 4 | 2 | 0.4% |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 378 | 380 |
| Distinct (%) | 84.8% | 85.2% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.7578475 | 6.7892377 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 329 | 331 |
| Unique (%) | 73.8% | 74.2% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | C.A. 34260 | 35851 |
| 2nd row | 349256 | 220367 |
| 3rd row | F.C.C. 13529 | S.O./P.P. 3 |
| 4th row | A/4 45380 | P/PP 3381 |
| 5th row | A/5 21174 | S.O.C. 14879 |
| Value | Count | Frequency (%) |
| pc | 27 | 4.8% |
| c.a | 12 | 2.1% |
| a/5 | 9 | 1.6% |
| ca | 7 | 1.2% |
| ston/o | 7 | 1.2% |
| 2 | 7 | 1.2% |
| w./c | 5 | 0.9% |
| soton/o.q | 5 | 0.9% |
| f.c.c | 5 | 0.9% |
| 4133 | 4 | 0.7% |
| Other values (398) | 479 |
| Value | Count | Frequency (%) |
| pc | 25 | 4.4% |
| a/5 | 13 | 2.3% |
| c.a | 11 | 1.9% |
| ca | 8 | 1.4% |
| w./c | 6 | 1.1% |
| soton/oq | 5 | 0.9% |
| ston/o | 5 | 0.9% |
| 2 | 5 | 0.9% |
| 347082 | 5 | 0.9% |
| ston/o2 | 4 | 0.7% |
| Other values (402) | 480 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 383 | |
| 1 | 349 | |
| 2 | 279 | |
| 7 | 250 | |
| 4 | 224 | 7.4% |
| 6 | 210 | 7.0% |
| 0 | 209 | 6.9% |
| 5 | 193 | 6.4% |
| 9 | 166 | 5.5% |
| 8 | 144 | 4.8% |
| Other values (25) | 607 |
| Value | Count | Frequency (%) |
| 3 | 381 | |
| 1 | 334 | |
| 2 | 285 | |
| 7 | 245 | 8.1% |
| 4 | 244 | 8.1% |
| 6 | 205 | 6.8% |
| 0 | 191 | 6.3% |
| 5 | 191 | 6.3% |
| 9 | 165 | 5.4% |
| 8 | 164 | 5.4% |
| Other values (22) | 623 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3014 |
| Value | Count | Frequency (%) |
| (unknown) | 3028 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 3 | 383 | |
| 1 | 349 | |
| 2 | 279 | |
| 7 | 250 | |
| 4 | 224 | 7.4% |
| 6 | 210 | 7.0% |
| 0 | 209 | 6.9% |
| 5 | 193 | 6.4% |
| 9 | 166 | 5.5% |
| 8 | 144 | 4.8% |
| Other values (25) | 607 |
| Value | Count | Frequency (%) |
| 3 | 381 | |
| 1 | 334 | |
| 2 | 285 | |
| 7 | 245 | 8.1% |
| 4 | 244 | 8.1% |
| 6 | 205 | 6.8% |
| 0 | 191 | 6.3% |
| 5 | 191 | 6.3% |
| 9 | 165 | 5.4% |
| 8 | 164 | 5.4% |
| Other values (22) | 623 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3014 |
| Value | Count | Frequency (%) |
| (unknown) | 3028 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 3 | 383 | |
| 1 | 349 | |
| 2 | 279 | |
| 7 | 250 | |
| 4 | 224 | 7.4% |
| 6 | 210 | 7.0% |
| 0 | 209 | 6.9% |
| 5 | 193 | 6.4% |
| 9 | 166 | 5.5% |
| 8 | 144 | 4.8% |
| Other values (25) | 607 |
| Value | Count | Frequency (%) |
| 3 | 381 | |
| 1 | 334 | |
| 2 | 285 | |
| 7 | 245 | 8.1% |
| 4 | 244 | 8.1% |
| 6 | 205 | 6.8% |
| 0 | 191 | 6.3% |
| 5 | 191 | 6.3% |
| 9 | 165 | 5.4% |
| 8 | 164 | 5.4% |
| Other values (22) | 623 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3014 |
| Value | Count | Frequency (%) |
| (unknown) | 3028 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 3 | 383 | |
| 1 | 349 | |
| 2 | 279 | |
| 7 | 250 | |
| 4 | 224 | 7.4% |
| 6 | 210 | 7.0% |
| 0 | 209 | 6.9% |
| 5 | 193 | 6.4% |
| 9 | 166 | 5.5% |
| 8 | 144 | 4.8% |
| Other values (25) | 607 |
| Value | Count | Frequency (%) |
| 3 | 381 | |
| 1 | 334 | |
| 2 | 285 | |
| 7 | 245 | 8.1% |
| 4 | 244 | 8.1% |
| 6 | 205 | 6.8% |
| 0 | 191 | 6.3% |
| 5 | 191 | 6.3% |
| 9 | 165 | 5.4% |
| 8 | 164 | 5.4% |
| Other values (22) | 623 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 178 | 177 |
| Distinct (%) | 39.9% | 39.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 32.58066 | 31.42214 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 6 | 9 |
| Zeros (%) | 1.3% | 2.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.225 |
| Q1 | 7.8958 | 7.8958 |
| median | 14.4542 | 14.4542 |
| Q3 | 30.6958 | 30.5 |
| 95-th percentile | 120 | 110.38748 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 22.8 | 22.6042 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 51.945576 | 49.948189 |
| Coefficient of variation (CV) | 1.5943684 | 1.5895858 |
| Kurtosis | 35.564148 | 41.011683 |
| Mean | 32.58066 | 31.42214 |
| Median Absolute Deviation (MAD) | 6.8042 | 6.7209 |
| Skewness | 5.0014039 | 5.3397644 |
| Sum | 14530.974 | 14014.274 |
| Variance | 2698.3429 | 2494.8216 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 7.75 | 21 | 4.7% |
| 7.8958 | 20 | 4.5% |
| 13 | 16 | 3.6% |
| 8.05 | 15 | 3.4% |
| 10.5 | 15 | 3.4% |
| 26 | 13 | 2.9% |
| 7.775 | 10 | 2.2% |
| 26.55 | 10 | 2.2% |
| 7.925 | 9 | 2.0% |
| 8.6625 | 8 | 1.8% |
| Other values (168) | 309 | 69.3% |
| Value | Count | Frequency (%) |
| 7.8958 | 28 | 6.3% |
| 8.05 | 24 | 5.4% |
| 7.75 | 20 | 4.5% |
| 13 | 19 | 4.3% |
| 10.5 | 12 | 2.7% |
| 26.55 | 11 | 2.5% |
| 0 | 9 | 2.0% |
| 8.6625 | 8 | 1.8% |
| 7.925 | 8 | 1.8% |
| 7.2292 | 8 | 1.8% |
| Other values (167) | 299 | 67.0% |
| Value | Count | Frequency (%) |
| 0 | 6 | 1.3% |
| 4.0125 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.75 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 1 | 0.2% |
| 7.05 | 5 | 1.1% |
| 7.0542 | 2 | 0.4% |
| 7.125 | 2 | 0.4% |
| 7.1417 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 9 | 2.0% |
| 4.0125 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4958 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 7.05 | 5 | 1.1% |
| 7.0542 | 2 | 0.4% |
| 7.125 | 2 | 0.4% |
| 7.225 | 5 | 1.1% |
| 7.2292 | 8 | 1.8% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 82 | 79 |
| Distinct (%) | 81.2% | 85.9% |
| Missing | 345 | 354 |
| Missing (%) | 77.4% | 79.4% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 15 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.6633663 | 3.5434783 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 68 | 67 |
| Unique (%) | 67.3% | 72.8% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | F33 | E77 |
| 2nd row | E44 | D20 |
| 3rd row | D11 | D17 |
| 4th row | F4 | D9 |
| 5th row | A14 | C125 |
| Value | Count | Frequency (%) |
| b96 | 4 | 3.3% |
| b98 | 4 | 3.3% |
| c22 | 3 | 2.5% |
| c26 | 3 | 2.5% |
| c23 | 3 | 2.5% |
| c25 | 3 | 2.5% |
| c27 | 3 | 2.5% |
| g6 | 3 | 2.5% |
| b28 | 2 | 1.7% |
| c52 | 2 | 1.7% |
| Other values (81) | 90 |
| Value | Count | Frequency (%) |
| c22 | 3 | 2.8% |
| c26 | 3 | 2.8% |
| d17 | 2 | 1.9% |
| d20 | 2 | 1.9% |
| e67 | 2 | 1.9% |
| g6 | 2 | 1.9% |
| d33 | 2 | 1.9% |
| c124 | 2 | 1.9% |
| b49 | 2 | 1.9% |
| b96 | 2 | 1.9% |
| Other values (80) | 84 |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 47 | |
| C | 39 | 10.5% |
| 6 | 29 | 7.8% |
| B | 29 | 7.8% |
| 1 | 28 | 7.6% |
| 3 | 21 | 5.7% |
| 8 | 20 | 5.4% |
| 7 | 20 | 5.4% |
| (space) | 19 | 5.1% |
| 4 | 18 | 4.9% |
| Other values (8) | 100 |
| Value | Count | Frequency (%) |
| 2 | 37 | |
| C | 34 | 10.4% |
| B | 30 | 9.2% |
| 6 | 25 | 7.7% |
| 1 | 24 | 7.4% |
| 5 | 19 | 5.8% |
| 3 | 19 | 5.8% |
| 7 | 18 | 5.5% |
| 8 | 17 | 5.2% |
| E | 17 | 5.2% |
| Other values (9) | 86 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 370 |
| Value | Count | Frequency (%) |
| (unknown) | 326 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| 2 | 47 | |
| C | 39 | 10.5% |
| 6 | 29 | 7.8% |
| B | 29 | 7.8% |
| 1 | 28 | 7.6% |
| 3 | 21 | 5.7% |
| 8 | 20 | 5.4% |
| 7 | 20 | 5.4% |
| (space) | 19 | 5.1% |
| 4 | 18 | 4.9% |
| Other values (8) | 100 |
| Value | Count | Frequency (%) |
| 2 | 37 | |
| C | 34 | 10.4% |
| B | 30 | 9.2% |
| 6 | 25 | 7.7% |
| 1 | 24 | 7.4% |
| 5 | 19 | 5.8% |
| 3 | 19 | 5.8% |
| 7 | 18 | 5.5% |
| 8 | 17 | 5.2% |
| E | 17 | 5.2% |
| Other values (9) | 86 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 370 |
| Value | Count | Frequency (%) |
| (unknown) | 326 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| 2 | 47 | |
| C | 39 | 10.5% |
| 6 | 29 | 7.8% |
| B | 29 | 7.8% |
| 1 | 28 | 7.6% |
| 3 | 21 | 5.7% |
| 8 | 20 | 5.4% |
| 7 | 20 | 5.4% |
| (space) | 19 | 5.1% |
| 4 | 18 | 4.9% |
| Other values (8) | 100 |
| Value | Count | Frequency (%) |
| 2 | 37 | |
| C | 34 | 10.4% |
| B | 30 | 9.2% |
| 6 | 25 | 7.7% |
| 1 | 24 | 7.4% |
| 5 | 19 | 5.8% |
| 3 | 19 | 5.8% |
| 7 | 18 | 5.5% |
| 8 | 17 | 5.2% |
| E | 17 | 5.2% |
| Other values (9) | 86 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 370 |
| Value | Count | Frequency (%) |
| (unknown) | 326 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| 2 | 47 | |
| C | 39 | 10.5% |
| 6 | 29 | 7.8% |
| B | 29 | 7.8% |
| 1 | 28 | 7.6% |
| 3 | 21 | 5.7% |
| 8 | 20 | 5.4% |
| 7 | 20 | 5.4% |
| (space) | 19 | 5.1% |
| 4 | 18 | 4.9% |
| Other values (8) | 100 |
| Value | Count | Frequency (%) |
| 2 | 37 | |
| C | 34 | 10.4% |
| B | 30 | 9.2% |
| 6 | 25 | 7.7% |
| 1 | 24 | 7.4% |
| 5 | 19 | 5.8% |
| 3 | 19 | 5.8% |
| 7 | 18 | 5.5% |
| 8 | 17 | 5.2% |
| E | 17 | 5.2% |
| Other values (9) | 86 |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 2 | 1 |
| Missing (%) | 0.4% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Category frequency bars for Dataset A and Dataset B (bar values not recovered): S, C, Q.
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | Q |
| 2nd row | C | S |
| 3rd row | S | S |
| 4th row | S | C |
| 5th row | S | S |
Common Values
| Value | Count | Frequency (%) |
| S | 325 | |
| C | 74 | 16.6% |
| Q | 45 | 10.1% |
| (Missing) | 2 | 0.4% |
| Value | Count | Frequency (%) |
| S | 334 | |
| C | 74 | 16.6% |
| Q | 37 | 8.3% |
| (Missing) | 1 | 0.2% |
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| s | 325 | |
| c | 74 | 16.7% |
| q | 45 | 10.1% |
| Value | Count | Frequency (%) |
| s | 334 | |
| c | 74 | 16.6% |
| q | 37 | 8.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 325 | |
| C | 74 | 16.7% |
| Q | 45 | 10.1% |
| Value | Count | Frequency (%) |
| S | 334 | |
| C | 74 | 16.6% |
| Q | 37 | 8.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 444 |
| Value | Count | Frequency (%) |
| (unknown) | 445 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| S | 325 | |
| C | 74 | 16.7% |
| Q | 45 | 10.1% |
| Value | Count | Frequency (%) |
| S | 334 | |
| C | 74 | 16.6% |
| Q | 37 | 8.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 444 |
| Value | Count | Frequency (%) |
| (unknown) | 445 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| S | 325 | |
| C | 74 | 16.7% |
| Q | 45 | 10.1% |
| Value | Count | Frequency (%) |
| S | 334 | |
| C | 74 | 16.6% |
| Q | 37 | 8.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 444 |
| Value | Count | Frequency (%) |
| (unknown) | 445 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| S | 325 | |
| C | 74 | 16.7% |
| Q | 45 | 10.1% |
| Value | Count | Frequency (%) |
| S | 334 | |
| C | 74 | 16.6% |
| Q | 37 | 8.3% |
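In the Embarked tables above, C has a count of 74 in Dataset A but appears as 16.6% under Common Values and as 16.7% in the character tables. One reading that matches the figures, though the report does not state it, is that Common Values divides by all 446 observations (missing included), while the character tables divide by the 444 non-missing characters. The sketch below checks that arithmetic; `frequency_pct` is a hypothetical helper, not part of ydata-profiling.

```python
def frequency_pct(counts: dict[str, int]) -> dict[str, float]:
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


# Counts copied from the Dataset A tables for Embarked.
common_a = {"S": 325, "C": 74, "Q": 45, "(Missing)": 2}  # denominator 446
chars_a = {"S": 325, "C": 74, "Q": 45}                   # denominator 444

print(frequency_pct(common_a))
print(frequency_pct(chars_a))
```

With these inputs the first call gives 16.6 for C and the second gives 16.7, which agrees with the two tables.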
Interactions
Dataset A and Dataset B: pairwise interaction plots (25 panel pairs, images not recovered).
Correlations
Dataset A and Dataset B: correlation heatmaps (images not recovered).
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.147 | -0.309 | 0.098 | 0.277 | 0.089 | -0.144 | 0.233 |
| Embarked | 0.000 | 1.000 | 0.162 | 0.085 | 0.000 | 0.258 | 0.039 | 0.000 | 0.145 |
| Fare | 0.147 | 0.162 | 1.000 | 0.436 | 0.012 | 0.480 | 0.155 | 0.451 | 0.283 |
| Parch | -0.309 | 0.085 | 0.436 | 1.000 | -0.014 | 0.000 | 0.198 | 0.468 | 0.165 |
| PassengerId | 0.098 | 0.000 | 0.012 | -0.014 | 1.000 | 0.000 | 0.056 | -0.084 | 0.096 |
| Pclass | 0.277 | 0.258 | 0.480 | 0.000 | 0.000 | 1.000 | 0.145 | 0.124 | 0.343 |
| Sex | 0.089 | 0.039 | 0.155 | 0.198 | 0.056 | 0.145 | 1.000 | 0.146 | 0.552 |
| SibSp | -0.144 | 0.000 | 0.451 | 0.468 | -0.084 | 0.124 | 0.146 | 1.000 | 0.130 |
| Survived | 0.233 | 0.145 | 0.283 | 0.165 | 0.096 | 0.343 | 0.552 | 0.130 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.181 | 0.120 | -0.242 | 0.072 | 0.277 | 0.072 | -0.164 | 0.154 |
| Embarked | 0.181 | 1.000 | 0.200 | 0.000 | 0.000 | 0.219 | 0.144 | 0.073 | 0.219 |
| Fare | 0.120 | 0.200 | 1.000 | 0.442 | -0.004 | 0.465 | 0.197 | 0.523 | 0.311 |
| Parch | -0.242 | 0.000 | 0.442 | 1.000 | 0.016 | 0.000 | 0.312 | 0.472 | 0.149 |
| PassengerId | 0.072 | 0.000 | -0.004 | 0.016 | 1.000 | 0.000 | 0.000 | -0.047 | 0.087 |
| Pclass | 0.277 | 0.219 | 0.465 | 0.000 | 0.000 | 1.000 | 0.062 | 0.179 | 0.357 |
| Sex | 0.072 | 0.144 | 0.197 | 0.312 | 0.000 | 0.062 | 1.000 | 0.294 | 0.495 |
| SibSp | -0.164 | 0.073 | 0.523 | 0.472 | -0.047 | 0.179 | 0.294 | 1.000 | 0.218 |
| Survived | 0.154 | 0.219 | 0.311 | 0.149 | 0.087 | 0.357 | 0.495 | 0.218 | 1.000 |
Missing values
Dataset A
A simple visualization of nullity by column.
Dataset B
A simple visualization of nullity by column.
Dataset A
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
Dataset B
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
Dataset A
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
Dataset B
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
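The nullity correlation described above can be approximated as the Pearson correlation between per-column missing-value indicators. The following is a minimal sketch assuming pandas is available; the `nullity_correlation` helper and the toy data are hypothetical, and the report's own heatmap may treat fully populated columns differently.

```python
import numpy as np
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between the missing-value indicators of each column."""
    indicators = df.isna()
    # Columns that are never (or always) missing have zero variance,
    # so their correlation is undefined; drop them first.
    varying = indicators.loc[:, indicators.nunique() > 1]
    return varying.astype(float).corr()


# Hypothetical toy frame, not taken from the report.
toy = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 35.0, np.nan],
        "Cabin": ["C85", None, "E46", None],
        "Fare": [7.25, 8.05, 53.1, 8.46],
    }
)

result = nullity_correlation(toy)
print(result)
```

In this toy frame Age and Cabin are missing in exactly the same rows, so their nullity correlation is 1, while Fare is dropped because it has no missing values.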
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 516 | 517 | 1 | 2 | Lemore, Mrs. (Amelia Milley) | female | 34.0 | 0 | 0 | C.A. 34260 | 10.5000 | F33 | S |
| 691 | 692 | 1 | 3 | Karun, Miss. Manca | female | 4.0 | 0 | 1 | 349256 | 13.4167 | NaN | C |
| 535 | 536 | 1 | 2 | Hart, Miss. Eva Miriam | female | 7.0 | 0 | 2 | F.C.C. 13529 | 26.2500 | NaN | S |
| 494 | 495 | 0 | 3 | Stanley, Mr. Edward Roland | male | 21.0 | 0 | 0 | A/4 45380 | 8.0500 | NaN | S |
| 212 | 213 | 0 | 3 | Perkin, Mr. John Henry | male | 22.0 | 0 | 0 | A/5 21174 | 7.2500 | NaN | S |
| 398 | 399 | 0 | 2 | Pain, Dr. Alfred | male | 23.0 | 0 | 0 | 244278 | 10.5000 | NaN | S |
| 817 | 818 | 0 | 2 | Mallet, Mr. Albert | male | 31.0 | 1 | 1 | S.C./PARIS 2079 | 37.0042 | NaN | C |
| 885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.0 | 0 | 5 | 382652 | 29.1250 | NaN | Q |
| 107 | 108 | 1 | 3 | Moss, Mr. Albert Johan | male | NaN | 0 | 0 | 312991 | 7.7750 | NaN | S |
| 576 | 577 | 1 | 2 | Garside, Miss. Ethel | female | 34.0 | 0 | 0 | 243880 | 13.0000 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 156 | 157 | 1 | 3 | Gilnagh, Miss. Katherine "Katie" | female | 16.0 | 0 | 0 | 35851 | 7.7333 | NaN | Q |
| 221 | 222 | 0 | 2 | Bracken, Mr. James H | male | 27.0 | 0 | 0 | 220367 | 13.0000 | NaN | S |
| 772 | 773 | 0 | 2 | Mack, Mrs. (Mary) | female | 57.0 | 0 | 0 | S.O./P.P. 3 | 10.5000 | E77 | S |
| 874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C |
| 385 | 386 | 0 | 2 | Davies, Mr. Charles Henry | male | 18.0 | 0 | 0 | S.O.C. 14879 | 73.5000 | NaN | S |
| 695 | 696 | 0 | 2 | Chapman, Mr. Charles Henry | male | 52.0 | 0 | 0 | 248731 | 13.5000 | NaN | S |
| 570 | 571 | 1 | 2 | Harris, Mr. George | male | 62.0 | 0 | 0 | S.W./PP 752 | 10.5000 | NaN | S |
| 787 | 788 | 0 | 3 | Rice, Master. George Hugh | male | 8.0 | 4 | 1 | 382652 | 29.1250 | NaN | Q |
| 593 | 594 | 0 | 3 | Bourke, Miss. Mary | female | NaN | 0 | 2 | 364848 | 7.7500 | NaN | Q |
| 157 | 158 | 0 | 3 | Corn, Mr. Harry | male | 30.0 | 0 | 0 | SOTON/OQ 392090 | 8.0500 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 373 | 374 | 0 | 1 | Ringhini, Mr. Sante | male | 22.0 | 0 | 0 | PC 17760 | 135.6333 | NaN | C |
| 172 | 173 | 1 | 3 | Johnson, Miss. Eleanor Ileen | female | 1.0 | 1 | 1 | 347742 | 11.1333 | NaN | S |
| 308 | 309 | 0 | 2 | Abelson, Mr. Samuel | male | 30.0 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C |
| 627 | 628 | 1 | 1 | Longley, Miss. Gretchen Fiske | female | 21.0 | 0 | 0 | 13502 | 77.9583 | D9 | S |
| 502 | 503 | 0 | 3 | O'Sullivan, Miss. Bridget Mary | female | NaN | 0 | 0 | 330909 | 7.6292 | NaN | Q |
| 448 | 449 | 1 | 3 | Baclini, Miss. Marie Catherine | female | 5.0 | 2 | 1 | 2666 | 19.2583 | NaN | C |
| 468 | 469 | 0 | 3 | Scanlan, Mr. James | male | NaN | 0 | 0 | 36209 | 7.7250 | NaN | Q |
| 617 | 618 | 0 | 3 | Lobb, Mrs. William Arthur (Cordelia K Stanlick) | female | 26.0 | 1 | 0 | A/5. 3336 | 16.1000 | NaN | S |
| 766 | 767 | 0 | 1 | Brewe, Dr. Arthur Jackson | male | NaN | 0 | 0 | 112379 | 39.6000 | NaN | C |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 71 | 72 | 0 | 3 | Goodwin, Miss. Lillian Amy | female | 16.0 | 5 | 2 | CA 2144 | 46.9000 | NaN | S |
| 279 | 280 | 1 | 3 | Abbott, Mrs. Stanton (Rosa Hunt) | female | 35.0 | 1 | 1 | C.A. 2673 | 20.2500 | NaN | S |
| 476 | 477 | 0 | 2 | Renouf, Mr. Peter Henry | male | 34.0 | 1 | 0 | 31027 | 21.0000 | NaN | S |
| 494 | 495 | 0 | 3 | Stanley, Mr. Edward Roland | male | 21.0 | 0 | 0 | A/4 45380 | 8.0500 | NaN | S |
| 633 | 634 | 0 | 1 | Parr, Mr. William Henry Marsh | male | NaN | 0 | 0 | 112052 | 0.0000 | NaN | S |
| 617 | 618 | 0 | 3 | Lobb, Mrs. William Arthur (Cordelia K Stanlick) | female | 26.0 | 1 | 0 | A/5. 3336 | 16.1000 | NaN | S |
| 226 | 227 | 1 | 2 | Mellors, Mr. William John | male | 19.0 | 0 | 0 | SW/PP 751 | 10.5000 | NaN | S |
| 296 | 297 | 0 | 3 | Hanna, Mr. Mansour | male | 23.5 | 0 | 0 | 2693 | 7.2292 | NaN | C |
| 876 | 877 | 0 | 3 | Gustafsson, Mr. Alfred Ossian | male | 20.0 | 0 | 0 | 7534 | 9.8458 | NaN | S |
| 737 | 738 | 1 | 1 | Lesurer, Mr. Gustave J | male | 35.0 | 0 | 0 | PC 17755 | 512.3292 | B101 | C |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.